Rotella PBMC

Overview:

This page contains the results of CoNGA analyses. Results in tables may have been filtered to reduce redundancy, focus on the most important columns, and limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.

Command:

scripts/run_conga.py --all --gex_data /scratch.global/ben_testing/ben_tcr/Rotelle_PBMC/outs/filtered_feature_bc_matrix.h5 --gex_data_type 10x_h5 --clones_file Rotelle_PBMC_TCR --organism human --outfile_prefix Rotelle_PBMC_Final2

Stats

num_cells_w_gex: 8239
num_features_start: 26530
num_cells_w_tcr: 869
min_genes_per_cell: 200
max_genes_per_cell: 3500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 13
num_filt_max_percent_mito: 0
num_antibody_features: 0
num_TR_genes: 42
num_TR_genes_in_hvg_set: 41
num_highly_variable_genes: 1760
num_cells_after_filtering: 856
num_clonotypes: 813
max_clonotype_size: 10
num_singleton_clonotypes: 790
nbr_frac_for_nndists: 0.1
num_gvg_hit_clonotypes: 45
num_gvg_hit_biclusters: 2

graph_vs_graph_stats


Here we assess overall graph-vs-graph correlation by counting the edges shared between the TCR and GEX neighbor graphs and comparing that observed number to the number we would expect if the graphs were completely uncorrelated. Our null model for uncorrelated graphs is to take the vertices of one graph and randomly renumber them (permute their labels). We compare the observed overlap to the overlap expected under this null model by computing a Z-score, either by permuting one graph's vertex labels many times to estimate the mean and standard deviation of the overlap distribution or, for large graphs where this is time-consuming, by using a regression model for the standard deviation. The rows of this table correspond to the different graph-graph comparisons made in the CoNGA graph-vs-graph analysis: we compare K-nearest-neighbor (KNN) graphs for GEX and TCR at different K values ("nbr_frac", aka neighbor fraction, which reports K as a fraction of the total number of clonotypes) to each other and to GEX and TCR "cluster" graphs, in which each clonotype is connected to all other clonotypes with the same (GEX or TCR) cluster assignment. For two K values (the default), this gives 2*3=6 comparisons: GEX KNN graph vs TCR KNN graph, GEX cluster graph vs TCR KNN graph, and GEX KNN graph vs TCR cluster graph, for each of the two K values (aka nbr_fracs).

The column to look at is *overlap_zscore*. Higher values indicate more significant GEX/TCR covariation, with "interesting" levels starting around Z-scores of 3-5.

Columns in more detail:

graph_overlap_type: the pair of graphs compared, written as GEX graph versus TCR graph, each either KNN ("nbr") or cluster (for example, gex_nbr_vs_tcr_cluster)

nbr_frac: the K value for the KNN graph, as a fraction of total clonotypes

overlap: the observed overlap (number of shared edges) between GEX and TCR graphs

expected_overlap: the expected overlap under a shuffled null model.

overlap_zscore: a Z-score for the observed overlap, computed by subtracting the mean overlap over the shuffles (overlap_mean) and dividing by the standard deviation estimated from shuffling (overlap_sdev).
overlap expected_overlap overlap_mean overlap_sdev overlap_zscore overlap_zscore_fitted overlap_zscore_source nodes calculation_time calculation_time_fitted gex_edges tcr_edges gex_indegree_variance gex_indegree_skewness gex_indegree_kurtosis tcr_indegree_variance tcr_indegree_skewness tcr_indegree_kurtosis indegree_correlation_R indegree_correlation_P nbr_frac graph_overlap_type
96 64.078818 63.99 7.698695 4.157848 4.704840 shuffling 813 0.085029 0.004109 6504 6504 1.495497 2.251494 6.149980 0.330434 1.498027 4.443742 0.000575 0.986943 0.01 gex_nbr_vs_tcr_nbr
733 631.921182 630.60 27.549955 3.716885 3.743635 shuffling 813 0.530271 0.044851 6504 64140 1.495497 2.251494 6.149980 0.149363 0.157461 -1.316945 -0.022790 0.516408 0.01 gex_nbr_vs_tcr_cluster
1619 1404.807882 1406.93 39.195983 5.410503 7.545624 shuffling 813 1.222494 0.103308 142588 6504 0.093190 -1.039231 1.181320 0.330434 1.498027 4.443742 0.037734 0.282530 0.01 gex_cluster_vs_tcr_nbr
7124 6569.080049 6578.00 137.791146 3.962519 3.901986 shuffling 813 0.681182 0.424426 65853 65853 0.911803 1.164565 0.882743 0.289807 1.948110 5.447132 -0.020425 0.560874 0.10 gex_nbr_vs_tcr_nbr
6807 6398.201970 6412.02 96.273463 4.102688 3.462992 shuffling 813 0.625387 0.412902 65853 64140 0.911803 1.164565 0.882743 0.149363 0.157461 -1.316945 -0.032487 0.354894 0.10 gex_nbr_vs_tcr_cluster
15297 14223.679803 14216.26 143.509346 7.530799 7.886804 shuffling 813 1.322048 0.951056 142588 65853 0.093190 -1.039231 1.181320 0.289807 1.948110 5.447132 0.020606 0.557404 0.10 gex_cluster_vs_tcr_nbr
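The shuffling null model can be illustrated with a minimal NumPy sketch. It assumes directed KNN graphs stored as boolean adjacency matrices and, for the toy graph builder, Euclidean distances on a feature matrix; CoNGA builds its GEX and TCR graphs from its own distances, so `knn_adjacency` and `overlap_zscore` are hypothetical helpers. The analytic mean used here, E_A*E_B/(n(n-1)) for edge counts E_A, E_B and n nodes, reproduces the expected_overlap column above (first row: 6504*6504/(813*812) ≈ 64.0788), and the Z-score uses the shuffled mean, as overlap_zscore does.

```python
import numpy as np


def knn_adjacency(X: np.ndarray, k: int) -> np.ndarray:
    """Directed KNN graph: each row points to its k nearest other rows (Euclidean)."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # a node is never its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((len(X), len(X)), dtype=bool)
    A[np.arange(len(X))[:, None], nbrs] = True
    return A


def overlap_zscore(A: np.ndarray, B: np.ndarray, n_shuffles: int = 1000, seed: int = 0):
    """Shared directed edges of A and B, compared with random relabelings of B."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    observed = int(np.sum(A & B))
    # exact mean under a uniform random relabeling of graphs without self-loops
    expected = A.sum() * B.sum() / (n * (n - 1))
    shuffled = np.empty(n_shuffles)
    for i in range(n_shuffles):
        p = rng.permutation(n)
        # relabel B's vertices: B'[i, j] = B[p[i], p[j]]
        shuffled[i] = np.sum(A & B[np.ix_(p, p)])
    z = (observed - shuffled.mean()) / shuffled.std()
    return observed, expected, shuffled.mean(), z
```

Permutation-based standard deviations get slow for large graphs, which is why the text mentions a regression model for the standard deviation as an alternative.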

graph_vs_graph


Graph vs graph analysis looks for correlation between GEX and TCR space by finding statistically significant overlap between two similarity graphs, one defined by GEX similarity and one by TCR sequence similarity.

Overlap is defined one node (clonotype) at a time by looking for overlap between that node's neighbors in the GEX graph and its neighbors in the TCR graph. The null model is that the two neighbor sets are chosen independently at random.

CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where K = neighborhood size is specified as a fraction of the number of clonotypes (defaults for K are 0.01 and 0.1), and cluster graphs, where each clonotype is connected to all the other clonotypes in the same (GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN, GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the K values (called nbr_fracs, short for neighbor fractions).
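The K values on this page follow from nbr_frac and the 813 clonotypes listed under Stats. This short sketch assumes K is nbr_frac times the number of clonotypes, rounded down (at least 1). That rounding rule is an assumption, but it is consistent with the num_neighbors values (8 and 81) and the gex_edges counts (6504 and 65853) reported in the tables.

```python
def num_neighbors(nbr_frac: float, num_clonotypes: int) -> int:
    """Assumed rule: K = nbr_frac * num_clonotypes, rounded down, at least 1."""
    return max(1, int(nbr_frac * num_clonotypes))


num_clonotypes = 813  # num_clonotypes from the Stats section
for nbr_frac in (0.01, 0.1):
    k = num_neighbors(nbr_frac, num_clonotypes)
    # a directed KNN graph has K outgoing edges per clonotype
    print(f"nbr_frac={nbr_frac}: K={k}, KNN edges={k * num_clonotypes}")
```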

Columns (these depend slightly on whether the hit is KNN vs KNN or KNN vs cluster):

conga_score: P value for GEX/TCR overlap * number of clonotypes

mait_fraction: fraction of the overlap made up of 'invariant' T cells

num_neighbors* (num_neighbors_gex, num_neighbors_tcr): size of neighborhood (K)

cluster_size: size of cluster (for KNN vs cluster graph overlaps)

clone_index: 0-based index of the clonotype in the adata object


conga_score num_neighbors_tcr cluster_size overlap overlap_corrected mait_fraction clone_index nbr_frac graph_overlap_type num_neighbors_gex gex_cluster tcr_cluster va ja cdr3a vb jb cdr3b
0.000718 NaN 34.0 14 14 0.0 569 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ30*01 CLVGSHDKIIF TRBV12-2*01 TRBJ1-3*01 CASSSGENSGNTVYF
0.001052 81.0 162.0 34 34 0.0 557 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ22*01 CLVGERSGWQLTF TRBV6-2*01 TRBJ2-1*01 CASTGTGEYNEQFF
0.001052 81.0 162.0 34 34 0.0 567 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ3*01 CLVGDRDSSASKIIF TRBV6-3*01 TRBJ2-5*01 CASSYRPQETQYF
0.003564 81.0 162.0 33 33 0.0 566 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ29*01 CLVGDWNSGNRALVF TRBV5-6*01 TRBJ2-1*01 CASSFSGGSLDEQFF
0.005104 NaN 34.0 13 13 0.0 551 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ13*01 CLVGDLSYQKVTF TRBV25-1*01 TRBJ1-1*01 CASAVRDAMNTEAFF
0.005104 NaN 34.0 13 13 0.0 571 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ32*01 CLVVGGSGNKLIF TRBV27*01 TRBJ1-5*01 CASSSGTDNQPQYF
0.011398 81.0 162.0 32 32 0.0 261 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ22*01 CALNGGGISDSGWQLTF TRBV6-3*01 TRBJ2-4*01 CASSYHRDKNTQYF
0.011398 81.0 162.0 32 32 0.0 255 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ12*01 CALNTDSDYKLIF TRBV6-3*01 TRBJ2-3*01 CASRLETGDRADPQYF
0.011398 81.0 162.0 32 32 0.0 553 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ20*01 CLVGDTNYKLSF TRBV3-2*01 TRBJ2-1*01 CASSQSMGDIYNEQFF
0.011398 81.0 162.0 32 32 0.0 555 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ21*01 CLVGEGNFNKFYF TRBV13*01 TRBJ1-6*01 CASSSQAGSPLYF
0.011398 81.0 162.0 32 32 0.0 584 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ5*01 CLVGDISAGRRALTF TRBV23-1*01 TRBJ1-4*01 CASSQHTTGDNEKLFF
0.011398 81.0 162.0 32 32 0.0 281 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ37*01 CALTGKLIF TRBV18*01 TRBJ2-3*01 CASSPPQQGDLTDPQYF
0.013654 NaN 71.0 19 19 0.0 257 0.10 gex_nbr_vs_tcr_cluster 81.0 2 4 TRAV19*01 TRAJ12*01 CAMDSDYKLIF TRBV21-1*01 TRBJ2-7*01 CASSNTGDYEQYF
0.029389 81.0 NaN 20 20 0.0 566 0.10 gex_nbr_vs_tcr_nbr 81.0 2 11 TRAV4*01 TRAJ29*01 CLVGDWNSGNRALVF TRBV5-6*01 TRBJ2-1*01 CASSFSGGSLDEQFF
0.029389 81.0 NaN 20 20 0.0 551 0.10 gex_nbr_vs_tcr_nbr 81.0 2 11 TRAV4*01 TRAJ13*01 CLVGDLSYQKVTF TRBV25-1*01 TRBJ1-1*01 CASAVRDAMNTEAFF
0.031772 NaN 34.0 12 12 0.0 570 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ32*01 CLVGDSGSGNKLIF TRBV7-6*01 TRBJ2-7*01 CASSPGLVRTYEQYF
0.031772 NaN 34.0 12 12 0.0 581 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ40*01 CLVGDMPGNYKYIF TRBV20-1*01 TRBJ2-1*01 CSVHWEGKDNEQFF
0.031772 NaN 34.0 12 12 0.0 559 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ23*01 CLVGGPAYNQAGKLIF TRBV23-1*01 TRBJ2-1*01 CASGTGNNEQFF
0.031772 NaN 34.0 12 12 0.0 561 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ27*01 CLVGDGGNADKLTF TRBV7-4*01 TRBJ2-7*01 CASSIGNEQYF
0.054196 NaN 71.0 18 18 0.0 252 0.10 gex_nbr_vs_tcr_cluster 81.0 2 4 TRAV19*01 TRAJ11*01 CALNHNSGYSTLTF TRBV9*01 TRBJ1-5*01 CASSLVGDDNQPQYF
0.054196 NaN 71.0 18 18 0.0 286 0.10 gex_nbr_vs_tcr_cluster 81.0 2 4 TRAV19*01 TRAJ41*01 CALNEAGSNSGYALNF TRBV7-4*01 TRBJ2-7*01 CASTAGLSYEQYF
0.061230 8.0 162.0 7 7 0.0 573 0.01 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ33*01 CLVGDMGSNYQLIW TRBV11-1*01 TRBJ1-1*01 CASSLGGRMNTEAFF
0.097953 81.0 162.0 30 30 0.0 570 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ32*01 CLVGDSGSGNKLIF TRBV7-6*01 TRBJ2-7*01 CASSPGLVRTYEQYF
0.097953 81.0 162.0 30 30 0.0 573 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ33*01 CLVGDMGSNYQLIW TRBV11-1*01 TRBJ1-1*01 CASSLGGRMNTEAFF
0.097953 81.0 162.0 30 30 0.0 550 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ13*01 CLVALSGSYQKVTF TRBV12-2*01 TRBJ2-7*01 CASSLRTGGSPEQYF
0.097953 81.0 162.0 30 30 0.0 292 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ48*01 CALISLFGNEKLTF TRBV4-3*01 TRBJ2-3*01 CASSQGEGVTDPQYF
0.097953 81.0 162.0 30 30 0.0 558 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ22*01 CLVPSDSGWQLTF TRBV5-6*01 TRBJ1-2*01 CASSLQGAGYDYTF
0.097953 81.0 162.0 30 30 0.0 287 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ42*01 CALARRRGSRGNLIF TRBV11-1*01 TRBJ1-1*01 CASSFNREGENTEAFF
0.097953 81.0 162.0 30 30 0.0 304 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ56*01 CALNDPTGANNKLTF TRBV24-1*01 TRBJ2-1*01 CATSEERGTGPYNEQFF
0.097953 81.0 162.0 30 30 0.0 249 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ10*01 CALNEAWLMGGGNKLTF TRBV15*01 TRBJ2-6*01 CASSKEVGGEGGSVLTF
0.105472 81.0 NaN 19 19 0.0 252 0.10 gex_nbr_vs_tcr_nbr 81.0 2 4 TRAV19*01 TRAJ11*01 CALNHNSGYSTLTF TRBV9*01 TRBJ1-5*01 CASSLVGDDNQPQYF
0.165112 NaN 71.0 5 5 0.0 539 0.01 gex_nbr_vs_tcr_cluster 8.0 1 4 TRAV38-1*01 TRAJ56*01 CAFMKHATGANNKLTF TRBV4-2*01 TRBJ1-2*01 CASSQDEGPYTF
0.172698 NaN 34.0 11 11 0.0 566 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ29*01 CLVGDWNSGNRALVF TRBV5-6*01 TRBJ2-1*01 CASSFSGGSLDEQFF
0.172698 NaN 34.0 11 11 0.0 558 0.10 gex_nbr_vs_tcr_cluster 81.0 2 11 TRAV4*01 TRAJ22*01 CLVPSDSGWQLTF TRBV5-6*01 TRBJ1-2*01 CASSLQGAGYDYTF
0.197360 NaN 71.0 17 17 0.0 304 0.10 gex_nbr_vs_tcr_cluster 81.0 2 4 TRAV19*01 TRAJ56*01 CALNDPTGANNKLTF TRBV24-1*01 TRBJ2-1*01 CATSEERGTGPYNEQFF
0.197360 NaN 71.0 17 17 0.0 296 0.10 gex_nbr_vs_tcr_cluster 81.0 2 4 TRAV19*01 TRAJ5*01 CALTPGAGRRALTF TRBV10-2*01 TRBJ2-1*01 CASVQDNEQFF
0.263085 81.0 162.0 29 29 0.0 576 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ36*01 CLVGDKAGVNNLFF TRBV11-3*01 TRBJ1-1*01 CASSSGQGETEAFF
0.263085 81.0 162.0 29 29 0.0 574 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ33*01 CLVGGRPDSNYQLIW TRBV28*01 TRBJ2-5*01 CASILTGLEETQYF
0.263085 81.0 162.0 29 29 0.0 565 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ28*01 CLVGPSGAGSYQLTF TRBV23-1*01 TRBJ1-1*01 CASSTRNTEAFF
0.263085 81.0 162.0 29 29 0.0 551 0.10 gex_cluster_vs_tcr_nbr NaN 2 11 TRAV4*01 TRAJ13*01 CLVGDLSYQKVTF TRBV25-1*01 TRBJ1-1*01 CASAVRDAMNTEAFF
0.263085 81.0 162.0 29 29 0.0 308 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ9*01 CALNGPNTGGFKTVF TRBV13*01 TRBJ2-7*01 CASSSGTVYEQYF
0.263085 81.0 162.0 29 29 0.0 257 0.10 gex_cluster_vs_tcr_nbr NaN 2 4 TRAV19*01 TRAJ12*01 CAMDSDYKLIF TRBV21-1*01 TRBJ2-7*01 CASSNTGDYEQYF
0.349133 81.0 NaN 18 18 0.0 550 0.10 gex_nbr_vs_tcr_nbr 81.0 2 11 TRAV4*01 TRAJ13*01 CLVALSGSYQKVTF TRBV12-2*01 TRBJ2-7*01 CASSLRTGGSPEQYF
0.349133 81.0 NaN 18 18 0.0 292 0.10 gex_nbr_vs_tcr_nbr 81.0 2 4 TRAV19*01 TRAJ48*01 CALISLFGNEKLTF TRBV4-3*01 TRBJ2-3*01 CASSQGEGVTDPQYF
0.349133 81.0 NaN 18 18 0.0 249 0.10 gex_nbr_vs_tcr_nbr 81.0 2 4 TRAV19*01 TRAJ10*01 CALNEAWLMGGGNKLTF TRBV15*01 TRBJ2-6*01 CASSKEVGGEGGSVLTF
0.349133 81.0 NaN 18 18 0.0 304 0.10 gex_nbr_vs_tcr_nbr 81.0 2 4 TRAV19*01 TRAJ56*01 CALNDPTGANNKLTF TRBV24-1*01 TRBJ2-1*01 CATSEERGTGPYNEQFF
0.658250 NaN 71.0 16 16 0.0 541 0.10 gex_nbr_vs_tcr_cluster 81.0 2 4 TRAV38-1*01 TRAJ57*01 CAFMRKGGSEKLVF TRBV13*01 TRBJ2-3*01 CASSLVGVYTDPQYF
0.658250 NaN 71.0 16 16 0.0 306 0.10 gex_nbr_vs_tcr_cluster 81.0 2 4 TRAV19*01 TRAJ58*01 CALNAQGTGGSRLTF TRBV7-6*01 TRBJ2-5*01 CASSFSLGGGDQYF
0.658250 NaN 71.0 16 16 0.0 274 0.10 gex_nbr_vs_tcr_cluster 81.0 2 4 TRAV19*01 TRAJ31*01 CALNGGNNNDRVIF TRBV12-3*01 TRBJ2-1*01 CASSEGGNNNEQFF
0.658250 NaN 71.0 16 16 0.0 262 0.10 gex_nbr_vs_tcr_cluster 81.0 4 4 TRAV19*01 TRAJ22*01 CALNLGRSGWQLTF TRBV24-1*01 TRBJ1-4*01 CATREGELGEKLFF
Omitted 27 lines
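The conga_score column can be approximated with a hypergeometric tail probability, which matches the null model described above: a clonotype's GEX and TCR neighbor sets are drawn independently at random. This SciPy sketch assumes the neighbors are drawn from the other num_clonotypes - 1 clonotypes and applies the "P value * number of clonotypes" correction from the column list. It is an illustration only and is not guaranteed to reproduce CoNGA's exact values (for example, its handling of cluster-graph sizes); `conga_score_sketch` is a hypothetical name.

```python
from scipy.stats import hypergeom


def conga_score_sketch(overlap: int, n_gex: int, n_tcr: int, num_clonotypes: int) -> float:
    """Hypergeometric P(overlap >= observed) * num_clonotypes.

    n_gex, n_tcr: sizes of the clonotype's GEX and TCR neighbor sets
    (K for a KNN graph; for a cluster graph, assumed to be the other cluster members).
    """
    population = num_clonotypes - 1  # candidate neighbors exclude the clonotype itself
    # sf(k - 1) = P(X >= k) for a discrete distribution;
    # SciPy's argument order is (k, M=population, n=successes, N=draws)
    pval = hypergeom.sf(overlap - 1, population, n_tcr, n_gex)
    return float(pval) * num_clonotypes


# a KNN-vs-KNN hit with K=81 on both sides and 20 shared neighbors
print(conga_score_sketch(20, 81, 81, 813))
```

Multiplying by the number of clonotypes is a Bonferroni-style correction for testing every clonotype, so scores below 1 correspond to nominally significant hits.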

graph_vs_graph_logos


This figure summarizes the results of a CoNGA analysis, which produces CoNGA scores and clusters. At the top are six 2D UMAP projections of clonotypes in the dataset based on GEX similarity (left three panels) and TCR similarity (right three panels), colored, from left to right, by: GEX cluster assignment; CoNGA score; joint GEX:TCR cluster assignment for clonotypes with significant CoNGA scores, using a bicolored disk whose left half indicates GEX cluster and whose right half indicates TCR cluster; TCR cluster; CoNGA score; and GEX:TCR cluster assignment for CoNGA hits, as in the third panel.

Below are two rows of GEX landscape plots colored by (first row, left) expression of selected marker genes, (second row, left) Z-score normalized and GEX-neighborhood averaged expression of the same marker genes, and (both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for TCR feature descriptions).

GEX and TCR sequence features of CoNGA hits in clusters with 5 or more hits are summarized by a series of logo-style visualizations, from left to right: differentially expressed genes (DEGs); TCR sequence logos showing the V and J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased TCR sequence scores, with red indicating elevated scores and blue indicating decreased scores relative to the rest of the dataset (see CoNGA manuscript Table S3 for score definitions); GEX 'logos' for each cluster consisting of a panel of marker genes shown with red disks colored by mean expression and sized according to the fraction of cells expressing the gene (gene names are given above).

DEG and TCR sequence logos are scaled by the adjusted P value of the associations, with full logo height requiring a top adjusted P value below 10^-6. DEGs with fold change less than 2 are shown in gray. Each cluster is indicated by a bicolored disk colored according to GEX cluster (left half) and TCR cluster (right half). The two numbers above each disk show the number of hits within the cluster (left) and the total number of cells in those clonotypes (right). The dendrogram at the left shows similarity relationships among the clusters based on connections in the GEX and TCR neighbor graphs.

The choice of which marker genes to use for the GEX UMAP panels and for the cluster GEX logos can be configured using run_conga.py command line flags or arguments to the conga.plotting.make_logo_plots function.
Image source: Rotelle_PBMC_Final2_graph_vs_graph_logos.png

tcr_clumping


This table stores the results of the TCR "clumping" analysis, which looks for neighborhoods in TCR space with more TCRs than expected by chance under a simple null model of VDJ rearrangement.

For each TCR in the dataset, we count how many TCRs lie within each of a set of fixed TCRdist radii (defaults: 24, 48, 72, 96) and compare that number to the number expected, given the size of the dataset, under a Poisson model. The approach is inspired by the ALICE and TCRnet methods.
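The Poisson comparison can be sketched as follows. The sketch assumes a precomputed TCRdist matrix and a per-TCR expected neighbor count from some background model (for example, a background neighbor frequency multiplied by the dataset size). How CoNGA derives that expectation is not shown here, so `expected_nbrs` and `clumping_pvalues` are hypothetical.

```python
import numpy as np
from scipy.stats import poisson


def clumping_pvalues(tcrdist: np.ndarray, radius: float, expected_nbrs: np.ndarray):
    """Per-TCR neighbor counts within `radius` and Poisson upper-tail P values.

    tcrdist: (n, n) symmetric TCRdist matrix.
    expected_nbrs: (n,) expected neighbor counts under the background model (assumed given).
    """
    n = tcrdist.shape[0]
    # count other TCRs within the radius (exclude each TCR's zero self-distance)
    within = (tcrdist <= radius) & ~np.eye(n, dtype=bool)
    observed = within.sum(axis=1)
    # P(X >= observed) for X ~ Poisson(expected)
    pvals = poisson.sf(observed - 1, expected_nbrs)
    return observed, pvals


dists = np.array([[0, 10, 20, 100],
                  [10, 0, 15, 100],
                  [20, 15, 0, 100],
                  [100, 100, 100, 0]])
obs, p = clumping_pvalues(dists, radius=24, expected_nbrs=np.full(4, 0.1))
print(obs, p)
```

Repeating the call for each default radius gives one count and P value per TCR per radius.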

Columns:

clump_type: 'global', unless we are optionally looking for TCR clumps within the individual GEX clusters

num_nbrs: neighborhood size (number of other TCRs with TCRdist


tcr_db_match


This table stores significant matches between TCRs in adata and TCRs in the file /scratch.global/ben_testing/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv

P values of matches are assigned by converting the raw TCRdist score into a P value based on a model of the V(D)J rearrangement process, so that, for example, matches between TCRs that are very far from germline are assigned higher significance.

Columns:

tcrdist: TCRdist distance between the two TCRs (adata query and db hit)

pvalue_adj: raw P value of the match * num query TCRs * num db TCRs

fdr_value: Benjamini-Hochberg FDR value for match

clone_index: index within adata of the query TCR clonotype

db_index: index of the hit in the database being matched

va, ja, cdr3a, vb, jb, cdr3b: V gene, J gene, and CDR3 sequence of the alpha and beta chains of the query TCR

db_XXX: where XXX is a field in the literature database
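The two multiple-testing columns can be illustrated with NumPy. `pvalue_adj` follows the definition above (raw P value * num query TCRs * num db TCRs). The Benjamini-Hochberg function is a standard textbook implementation over a full vector of raw P values; treating it as how fdr_value is computed is an assumption, not CoNGA's code.

```python
import numpy as np


def pvalue_adj(raw_pvals, num_query_tcrs: int, num_db_tcrs: int) -> np.ndarray:
    """Bonferroni-style correction: raw P value * num query TCRs * num db TCRs."""
    return np.asarray(raw_pvals, dtype=float) * num_query_tcrs * num_db_tcrs


def benjamini_hochberg(raw_pvals) -> np.ndarray:
    """Benjamini-Hochberg FDR values (q-values) for a vector of raw P values."""
    p = np.asarray(raw_pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # p_(i) * m / i for the sorted P values
    scaled = p[order] * m / np.arange(1, m + 1)
    # running minimum from the largest P value down keeps q-values monotone
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    fdr = np.empty(m)
    fdr[order] = np.minimum(scaled, 1.0)
    return fdr


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The Bonferroni-style value is more conservative; the BH value controls the expected fraction of false matches among those reported.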



tcr_graph_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed first by a t-test, for speed, and then by a Mann-Whitney U test for neighborhood/gene combinations whose t-test P value passes an initial threshold (default: 10 times the P value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj: t-test P value * number of comparisons

mwu_pvalue_adj: Mann-Whitney U P value * number of comparisons

log2enr: log2 fold change of the gene in the neighborhood (will be positive)

gex_cluster: the consensus GEX cluster of the clonotypes with biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes with biased scores

num_fg: the number of clonotypes in the neighborhood (including the center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the gene

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the AnnData dataset of the clonotype at the center of the neighborhood (-1 for rows whose graph_type is tcr_cluster, where the neighborhood is a whole cluster)


ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction nbr_frac graph_type feature_type
2.090120e-25 5.916646e-57 5.273710 0 6 ENSMMUG00000043894 2.511757 0.256801 60 -1 0.0 0.00 tcr_cluster gex
8.715793e-07 6.547524e-57 5.738599 0 0 ENSMMUG00000060662 1.129266 0.038458 82 732 0.0 0.10 tcr_nbr gex
1.053885e-06 8.416104e-57 5.690479 0 0 ENSMMUG00000060662 1.121724 0.039304 82 743 0.0 0.10 tcr_nbr gex
7.973059e-06 1.137236e-50 5.419721 0 0 ENSMMUG00000060662 1.077499 0.044265 82 762 0.0 0.10 tcr_nbr gex
7.674795e-06 1.196794e-50 5.423429 0 0 ENSMMUG00000060662 1.078124 0.044194 82 726 0.0 0.10 tcr_nbr gex
9.413289e-06 1.769440e-50 5.356692 0 0 ENSMMUG00000060662 1.066784 0.045467 82 723 0.0 0.10 tcr_nbr gex
2.129767e-05 1.421362e-47 5.239981 0 0 ENSMMUG00000060662 1.046544 0.047737 82 756 0.0 0.10 tcr_nbr gex
4.787865e-05 3.037410e-45 5.252560 0 0 ENSMMUG00000060662 1.048750 0.047489 82 729 0.0 0.10 tcr_nbr gex
5.366347e-05 4.196303e-45 5.211375 0 0 ENSMMUG00000060662 1.041506 0.048302 82 748 0.0 0.10 tcr_nbr gex
8.884374e-05 7.870720e-45 5.099001 0 0 ENSMMUG00000060662 1.021433 0.050554 82 747 0.0 0.10 tcr_nbr gex
6.157709e-05 1.158075e-44 5.097260 0 0 ENSMMUG00000060662 1.021119 0.050589 82 754 0.0 0.10 tcr_nbr gex
8.524660e-05 1.528985e-42 5.182886 0 0 ENSMMUG00000060662 1.036460 0.048868 82 758 0.0 0.10 tcr_nbr gex
1.171426e-04 2.777790e-42 5.077276 0 0 ENSMMUG00000060662 1.017502 0.050995 82 719 0.0 0.10 tcr_nbr gex
2.212358e-04 5.280523e-42 4.971283 0 0 ENSMMUG00000060662 0.998094 0.053172 82 722 0.0 0.10 tcr_nbr gex
1.348242e-04 5.448345e-42 4.983738 0 0 ENSMMUG00000060662 1.000394 0.052914 82 744 0.0 0.10 tcr_nbr gex
1.723799e-04 1.066420e-41 4.895979 1 0 ENSMMUG00000060662 0.984085 0.054743 82 725 0.0 0.10 tcr_nbr gex
4.024543e-04 8.720075e-41 6.614872 0 2 ENSMMUG00000056431 0.536290 0.007214 102 -1 0.0 0.00 tcr_cluster gex
1.356995e-01 3.107400e-40 5.828561 0 2 ENSMMUG00000056431 0.601254 0.014402 82 510 0.0 0.10 tcr_nbr gex
1.423645e-01 3.436002e-40 5.779345 0 2 ENSMMUG00000056431 0.597829 0.014787 82 526 0.0 0.10 tcr_nbr gex
1.474711e-01 3.576926e-40 5.769268 0 2 ENSMMUG00000056431 0.597119 0.014866 82 514 0.0 0.10 tcr_nbr gex
4.479074e-04 9.653405e-40 4.966489 0 0 ENSMMUG00000060662 0.997207 0.053271 82 728 0.0 0.10 tcr_nbr gex
2.414512e-04 1.831099e-39 4.951339 1 0 ENSMMUG00000060662 0.994401 0.053586 82 753 0.0 0.10 tcr_nbr gex
7.616958e+00 1.771152e-38 6.429089 2 2 ENSMMUG00000056431 1.764916 0.054660 9 520 0.0 0.01 tcr_nbr gex
1.194408e-07 2.543957e-38 5.144844 0 0 ENSMMUG00000060662 0.774745 0.032536 127 -1 0.0 0.00 tcr_cluster gex
5.298851e-04 2.653212e-37 4.986995 0 0 ENSMMUG00000060662 1.000994 0.052846 82 739 0.0 0.10 tcr_nbr gex
3.016541e-01 5.969024e-37 5.672463 0 2 ENSMMUG00000056431 0.590162 0.015647 82 520 0.0 0.10 tcr_nbr gex
1.043226e-03 7.480887e-37 4.813065 0 0 ENSMMUG00000060662 0.968459 0.056496 82 730 0.0 0.10 tcr_nbr gex
2.613080e-01 7.530056e-37 5.634874 0 2 ENSMMUG00000056431 0.587390 0.015958 82 525 0.0 0.10 tcr_nbr gex
1.526950e-03 8.799921e-37 4.756484 0 0 ENSMMUG00000060662 0.957681 0.057705 82 718 0.0 0.10 tcr_nbr gex
3.478362e-01 9.870896e-37 5.516429 0 2 ENSMMUG00000056431 0.578405 0.016966 82 495 0.0 0.10 tcr_nbr gex
9.901128e-04 1.081816e-36 4.789364 0 0 ENSMMUG00000060662 0.963955 0.057001 82 752 0.0 0.10 tcr_nbr gex
7.564623e-04 1.081816e-36 4.810202 0 0 ENSMMUG00000060662 0.967916 0.056557 82 715 0.0 0.10 tcr_nbr gex
1.119474e-03 1.253569e-36 4.757200 0 0 ENSMMUG00000060662 0.957818 0.057690 82 716 0.0 0.10 tcr_nbr gex
3.817980e-01 1.938810e-36 5.305863 0 2 ENSMMUG00000056431 0.561484 0.018864 82 519 0.0 0.10 tcr_nbr gex
1.416187e-03 5.044689e-36 5.149550 2 9 ENSMMUG00000061119 1.157679 0.059672 42 -1 0.0 0.00 tcr_cluster gex
2.352542e-03 4.149398e-34 4.629255 0 0 ENSMMUG00000060662 0.933123 0.060460 82 742 0.0 0.10 tcr_nbr gex
2.963591e-03 1.018014e-33 4.518643 0 0 ENSMMUG00000060662 0.911439 0.062892 82 741 0.0 0.10 tcr_nbr gex
6.930714e-01 1.376461e-33 5.401528 0 2 ENSMMUG00000056431 0.569321 0.017985 82 516 0.0 0.10 tcr_nbr gex
4.061500e-03 1.493501e-33 4.502807 0 0 ENSMMUG00000060662 0.908311 0.063243 82 746 0.0 0.10 tcr_nbr gex
7.234812e-01 1.510470e-33 5.358166 0 2 ENSMMUG00000056431 0.565799 0.018380 82 504 0.0 0.10 tcr_nbr gex
6.293213e-01 1.688505e-33 5.377382 0 2 ENSMMUG00000056431 0.567366 0.018204 82 518 0.0 0.10 tcr_nbr gex
5.959907e-03 1.954904e-33 4.433673 0 0 ENSMMUG00000060662 0.894590 0.064782 82 720 0.0 0.10 tcr_nbr gex
7.324311e-01 2.539074e-33 5.266582 0 2 ENSMMUG00000056431 0.558195 0.019233 82 497 0.0 0.10 tcr_nbr gex
8.059679e-01 3.477800e-33 5.148103 0 2 ENSMMUG00000056431 0.548026 0.020373 82 517 0.0 0.10 tcr_nbr gex
1.194977e-06 4.672222e-33 4.957800 0 1 ENSMMUG00000065017 0.845550 0.041883 106 -1 0.0 0.00 tcr_cluster gex
8.777602e-04 3.162081e-32 4.794910 0 1 ENSMMUG00000065017 0.954992 0.055992 82 69 0.0 0.10 tcr_nbr gex
9.507545e-03 2.590953e-31 4.419243 0 0 ENSMMUG00000060662 0.891714 0.065105 82 749 0.0 0.10 tcr_nbr gex
2.637064e-03 6.573415e-30 4.662341 0 1 ENSMMUG00000065017 0.929815 0.058816 82 53 0.0 0.10 tcr_nbr gex
1.981927e-03 6.660292e-30 4.666669 0 1 ENSMMUG00000065017 0.930644 0.058723 82 33 0.0 0.10 tcr_nbr gex
3.979837e-03 7.019358e-30 4.623448 0 1 ENSMMUG00000065017 0.922339 0.059655 82 67 0.0 0.10 tcr_nbr gex
Omitted 181 lines

tcr_graph_vs_gex_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top-scoring one) is shown for each clonotype (i.e., neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations is 'mwu_pvalue_adj', and the threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, i.e., where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than with a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: Rotelle_PBMC_Final2_tcr_graph_vs_gex_features_plot.png
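The labeling rule described for this plot (at most one feature per neighborhood, ranked by mwu_pvalue_adj, shown only below the 1.0 threshold, with '*' marking cluster-level hits) can be sketched with pandas. Treating rows with clone_index -1 as cluster-level hits and grouping them by tcr_cluster is an assumption based on the tcr_cluster rows in the table above, as is the strict "<" comparison; `pick_feature_labels` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def pick_feature_labels(results: pd.DataFrame, score_col: str = "mwu_pvalue_adj",
                        threshold: float = 1.0) -> pd.DataFrame:
    """Keep the top-scoring (lowest adjusted P) feature per neighborhood."""
    hits = results[results[score_col] < threshold].copy()
    is_cluster = hits["clone_index"] < 0
    # a neighborhood is a single clonotype, or a whole TCR cluster for cluster-level rows
    hits["key"] = np.where(is_cluster,
                           "tcr_cluster_" + hits["tcr_cluster"].astype(str),
                           "clone_" + hits["clone_index"].astype(str))
    # cluster-level features get a trailing '*'
    hits["label"] = np.where(is_cluster, hits["feature"] + "*", hits["feature"])
    best = hits.sort_values(score_col).drop_duplicates("key", keep="first")
    return best[["key", "label", score_col]].reset_index(drop=True)
```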

tcr_graph_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX, TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors of each clonotype (K is indicated in the lower right corner of each panel). The min and max neighbor-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: Rotelle_PBMC_Final2_tcr_graph_vs_gex_features_panels.png
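The neighbor averaging and plotting order described for these panels can be sketched in NumPy. The sketch assumes a precomputed (n, K) array of nearest-neighbor indices and averages over those neighbors only; whether CoNGA also includes the center clonotype in the average is not stated, so that choice is an assumption, and both function names are hypothetical.

```python
import numpy as np


def nbr_average(scores: np.ndarray, knn_indices: np.ndarray) -> np.ndarray:
    """Average each clonotype's raw score over its K nearest neighbors."""
    return scores[knn_indices].mean(axis=1)


def plotting_order(avg_scores: np.ndarray) -> np.ndarray:
    """Indices by increasing score, so high-scoring points are drawn last (on top)."""
    return np.argsort(avg_scores, kind="stable")


scores = np.array([0.0, 1.0, 2.0, 3.0])
knn = np.array([[1, 2], [0, 2], [0, 1], [1, 2]])
avg = nbr_average(scores, knn)
print(avg.min(), avg.max())  # the range reported in the panel corners
print(plotting_order(avg))
```

Drawing points in increasing score order keeps the small number of high-scoring clonotypes from being hidden under the background.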

tcr_genes_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed first by a t-test, for speed, and then by a Mann-Whitney U test for neighborhood/gene combinations whose t-test P value passes an initial threshold (default: 10 times the P value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj: t-test P value * number of comparisons

mwu_pvalue_adj: Mann-Whitney U P value * number of comparisons

log2enr: log2 fold change of the gene in the neighborhood (will be positive)

gex_cluster: the consensus GEX cluster of the clonotypes with biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes with biased scores

num_fg: the number of clonotypes in the neighborhood (including the center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the gene

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the AnnData dataset of the clonotype at the center of the neighborhood

In this analysis, the TCR graph is defined by connecting all clonotypes that share the same gene segment of a given type (VA, JA, VB, or JB); the analysis is run four times, once for each gene segment type.
ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction gene_segment graph_type feature_type
2.023097e-15 2.381494e-158 10.311670 0 2 ENSMMUG00000056431 1.637157 0.003253 35 -1 0.000000 TRAV35 tcr_genes gex
1.399801e-08 1.545821e-147 9.845546 1 2 ENSMMUG00000059325 1.660134 0.004620 23 -1 0.000000 TRAV25 tcr_genes gex
4.684207e-06 3.240195e-141 10.135307 0 5 ENSMMUG00000054409 1.964874 0.005439 22 -1 0.000000 TRAV6 tcr_genes gex
4.506308e-25 4.386109e-132 8.034113 0 0 ENSMMUG00000060662 2.061050 0.025813 49 -1 0.000000 TRAV8-7 tcr_genes gex
1.283064e-05 1.733129e-129 8.543027 0 2 ENSMMUG00000052673 1.300378 0.007135 25 -1 0.000000 TRAV27 tcr_genes gex
3.772965e-18 4.317129e-123 9.228719 0 4 ENSMMUG00000063185 2.474131 0.017958 34 -1 0.000000 TRBV4-2 tcr_genes gex
4.285052e+00 2.682576e-119 11.431689 0 11 ENSMMUG00000049767 2.721227 0.005127 9 -1 0.000000 TRBV5-8 tcr_genes gex
9.712109e-19 4.099275e-119 8.929392 4 7 ENSMMUG00000062211 2.819113 0.031818 35 -1 0.000000 TRBV12-2 tcr_genes gex
3.293600e-29 1.125582e-116 7.955469 0 1 ENSMMUG00000065017 2.141128 0.029803 45 -1 0.000000 TRAV12-1 tcr_genes gex
1.647626e-19 2.953249e-107 7.966247 0 0 ENSMMUG00000062085 2.334548 0.036609 36 -1 0.111111 TRBV4-3 tcr_genes gex
8.582740e-11 3.293508e-103 7.523616 3 0 ENSMMUG00000056910 1.747838 0.025445 27 -1 0.000000 TRAV16 tcr_genes gex
3.981197e-01 6.169407e-101 9.684179 0 10 ENSMMUG00000062897 2.429090 0.012501 10 -1 0.000000 TRBV11-2 tcr_genes gex
5.449445e-03 1.555224e-81 6.326424 0 0 ENSMMUG00000061081 0.805395 0.015304 28 -1 0.000000 TRAV8-2 tcr_genes gex
1.588280e-11 1.563114e-71 6.755727 1 9 ENSMMUG00000061119 1.915783 0.052214 28 -1 0.000000 TRAV18 tcr_genes gex
4.251305e-02 8.464781e-71 6.420656 0 0 ENSMMUG00000057062 1.044101 0.021261 20 -1 0.000000 TRAV8-3 tcr_genes gex
1.083661e-43 3.342600e-65 5.629427 1 6 ENSMMUG00000043894 2.703696 0.248030 58 -1 0.000000 TRBV20-1 tcr_genes gex
1.293579e-24 7.840781e-42 4.798228 1 8 ENSMMUG00000043894 2.407015 0.309711 44 -1 0.000000 TRBV19 tcr_genes gex
2.018880e-05 6.078189e-40 6.570135 1 6 ENSMMUG00000062974 2.011605 0.065926 17 -1 0.000000 TRAV13-2 tcr_genes gex
1.405656e-22 1.821342e-32 4.723584 1 5 ENSMMUG00000056515 2.606042 0.388545 42 -1 0.000000 TRBV6-3 tcr_genes gex
2.024391e-08 1.155394e-22 4.578814 4 4 ENSMMUG00000043894 2.421276 0.357209 26 -1 0.000000 TRBV21-1 tcr_genes gex
1.235067e-11 7.116288e-22 4.350750 2 0 ENSMMUG00000056515 2.462766 0.422809 32 -1 0.000000 TRBV6-2 tcr_genes gex
2.611869e+00 1.337666e-18 6.595482 0 10 ENSMMUG00000051385 2.633613 0.125440 9 -1 0.000000 TRBV7-4 tcr_genes gex
6.607468e+00 1.543535e-18 5.787170 2 4 ENSMMUG00000062211 2.188966 0.134120 7 -1 0.000000 TRBV12-3 tcr_genes gex
5.608946e-01 2.712174e-17 5.568955 0 0 ENSMMUG00000051385 2.041498 0.132068 9 -1 0.000000 TRBV5-6 tcr_genes gex
1.911247e+00 2.969819e-16 6.604710 2 0 ENSMMUG00000051385 2.661682 0.128277 8 -1 0.000000 TRBV7-6 tcr_genes gex
4.017638e-11 3.285419e-16 4.215806 3 9 ENSMMUG00000056515 2.427408 0.442052 25 -1 0.000000 TRBV10-2 tcr_genes gex
8.862014e-07 3.073363e-13 2.629011 0 1 ENSMMUG00000056515 1.512360 0.452248 39 -1 0.000000 TRBV9 tcr_genes gex
3.682711e-05 1.467836e-11 2.640281 0 11 ENSMMUG00000056515 1.531249 0.458228 34 -1 0.125000 TRBV10-1 tcr_genes gex
1.592875e+00 2.545340e-04 2.197282 2 11 CD8A 1.192613 0.405844 35 -1 0.000000 TRAV4 tcr_genes gex
1.389609e-03 4.956171e-04 1.073736 0 1 ENSMMUG00000055756 1.386624 0.886198 79 -1 0.000000 TRBJ1-4 tcr_genes gex
3.884116e+00 1.013798e-03 2.454590 2 11 ENSMMUG00000003532 1.662382 0.576226 35 -1 0.000000 TRAV4 tcr_genes gex
5.578509e+00 9.450066e-03 1.864993 2 4 ENSMMUG00000003532 1.326029 0.564945 62 -1 0.000000 TRAV19 tcr_genes gex
9.741658e+00 3.183311e-02 2.079100 1 0 SP3 0.845704 0.273586 16 -1 0.000000 TRAJ47 tcr_genes gex
6.407735e+00 2.282157e-01 3.235854 0 7 AP2A1 0.714843 0.105084 4 -1 0.000000 TRBV11-3 tcr_genes gex
4.442028e+00 2.585186e-01 1.114785 0 6 ENSMMUG00000059019 0.751468 0.416881 77 -1 0.000000 TRBJ1-2 tcr_genes gex
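The gene-segment "graph" used for this table is a cluster graph: all clonotypes sharing a segment form one neighborhood. The sketch below groups clonotypes by segment and computes the mean_fg, mean_bg, and num_fg columns for one gene. Stripping the allele suffix (TRAV4*01 becomes TRAV4) mirrors the gene_segment values in the table above; the min_size cutoff and the function names are hypothetical. In the analysis this grouping would be repeated for the VA, JA, VB, and JB columns.

```python
from collections import defaultdict

import numpy as np


def segment_neighborhoods(segments, min_size=2):
    """Map each gene segment (allele suffix removed) to the indices of clonotypes using it."""
    groups = defaultdict(list)
    for i, seg in enumerate(segments):
        groups[seg.split("*")[0]].append(i)
    return {seg: np.array(idx) for seg, idx in groups.items() if len(idx) >= min_size}


def segment_means(expr, segments, min_size=2):
    """Return {segment: (mean_fg, mean_bg, num_fg)} for one gene."""
    expr = np.asarray(expr, dtype=float)
    out = {}
    for seg, idx in segment_neighborhoods(segments, min_size).items():
        fg = np.zeros(len(expr), dtype=bool)
        fg[idx] = True
        out[seg] = (float(expr[fg].mean()), float(expr[~fg].mean()), int(fg.sum()))
    return out


va = ["TRAV4*01", "TRAV4*01", "TRAV19*01", "TRAV19*01", "TRAV8-2*01"]
print(segment_means([2.0, 4.0, 0.0, 0.0, 1.0], va))
```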

tcr_genes_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX, TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors of each clonotype (K is indicated in the lower right corner of each panel). The min and max neighbor-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: Rotelle_PBMC_Final2_tcr_genes_vs_gex_features_panels.png

gex_graph_vs_tcr_features


This table has results from a graph-vs-features analysis in which we look at the distribution of a set of TCR-defined features over the GEX neighbor graph. We look for neighborhoods in the graph that have biased score distributions. For speed, each neighborhood/feature combination is first assessed with a t-test; only combinations whose t-test P value passes an initial threshold (by default, 10 times the P value threshold) are then assessed with a Mann-Whitney U test.

Each row of the table represents a single significant association: a neighborhood (defined by the index of its central clonotype, or by a whole GEX cluster for rows from the cluster graph) and a TCR feature.

The columns are as follows:

nbr_frac: the neighbor fraction used to build the K-nearest-neighbor graph (0.0 for rows from the cluster graph)

graph_type: gex_nbr for the K-nearest-neighbor graph, gex_cluster for the cluster graph

ttest_pvalue_adj: t-test P value * number of comparisons

ttest_stat: t-test statistic (the sign indicates whether the feature is up or down in the neighborhood)

mwu_pvalue_adj: Mann-Whitney U P value * number of comparisons

gex_cluster: the consensus GEX cluster of the clonotypes with biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes with biased scores

num_fg: the number of clonotypes in the neighborhood (including the center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the TCR score

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the anndata dataset of the clonotype at the center of the neighborhood (-1 for rows from the cluster graph)

feature_type: the feature modality (tcr)


nbr_frac graph_type ttest_pvalue_adj ttest_stat mwu_pvalue_adj gex_cluster tcr_cluster num_fg mean_fg mean_bg feature mait_fraction clone_index feature_type
0.0 gex_cluster 4.033806e-02 4.208120 1.018345e-07 2.0 11.0 163.0 0.134969 0.020000 TRAV4 0.0 -1.0 tcr
0.0 gex_cluster 2.007759e-07 6.609655 4.315838e-07 2.0 4.0 163.0 0.196514 -0.041545 cd8 0.0 -1.0 tcr
0.0 gex_cluster 1.424384e-01 3.878018 2.775939e-04 2.0 4.0 163.0 0.171779 0.052308 TRAV19 0.0 -1.0 tcr
0.1 gex_nbr 7.747291e-02 5.317469 1.337488e-01 2.0 4.0 82.0 0.224805 -0.018340 cd8 0.0 325.0 tcr
0.1 gex_nbr 1.960297e-01 5.103595 1.424349e-01 2.0 4.0 82.0 0.218594 -0.017643 cd8 0.0 627.0 tcr
0.1 gex_nbr 9.123863e-02 5.289567 1.718738e-01 2.0 4.0 82.0 0.233981 -0.019370 cd8 0.0 296.0 tcr
0.0 gex_cluster 5.239862e-02 -4.079208 1.889135e-01 0.0 6.0 237.0 -0.090594 0.046004 cd8 0.0 -1.0 tcr
0.0 gex_cluster 3.832472e+00 2.916780 1.982995e-01 1.0 4.0 184.0 0.108696 0.038156 TRBV19 0.0 -1.0 tcr
0.1 gex_nbr 1.907316e-01 5.115913 2.631118e-01 2.0 4.0 82.0 0.225681 -0.018438 cd8 0.0 566.0 tcr
0.1 gex_nbr 1.012477e-01 5.255064 2.866805e-01 2.0 4.0 82.0 0.221720 -0.017994 cd8 0.0 186.0 tcr
0.1 gex_nbr 1.160989e-01 5.216863 3.400443e-01 2.0 4.0 82.0 0.213995 -0.017128 cd8 0.0 570.0 tcr
0.1 gex_nbr 3.695361e-01 4.957319 6.241432e-01 2.0 4.0 82.0 0.217809 -0.017555 cd8 0.0 328.0 tcr
0.1 gex_nbr 4.724420e-01 4.901521 7.226272e-01 2.0 4.0 82.0 0.219849 -0.017784 cd8 0.0 494.0 tcr
0.1 gex_nbr 3.966239e-01 4.934160 8.697994e-01 2.0 4.0 82.0 0.209633 -0.016638 cd8 0.0 451.0 tcr
0.1 gex_nbr 6.992939e-01 4.795477 1.074876e+00 2.0 4.0 82.0 0.202160 -0.015800 cd8 0.0 394.0 tcr
0.1 gex_nbr 5.018388e-01 4.880069 1.154740e+00 2.0 4.0 82.0 0.210390 -0.016723 cd8 0.0 555.0 tcr
0.1 gex_nbr 7.200709e-01 4.787134 1.168141e+00 2.0 4.0 82.0 0.200439 -0.015607 cd8 0.0 456.0 tcr
0.0 gex_cluster 6.674994e-02 -4.007018 1.272531e+00 0.0 2.0 237.0 0.029536 0.095486 TRAV19 0.0 -1.0 tcr
0.1 gex_nbr 7.293143e-01 4.796122 1.545151e+00 2.0 4.0 82.0 0.215954 -0.017347 cd8 0.0 221.0 tcr
0.0 gex_cluster 1.885184e-02 -4.300044 1.800956e+00 0.0 2.0 237.0 0.008439 0.057292 TRAV4 0.0 -1.0 tcr
0.0 gex_cluster 1.183422e-02 -4.422689 3.355746e+00 3.0 2.0 153.0 0.019608 0.089394 TRAV19 0.0 -1.0 tcr
0.0 gex_cluster 3.908126e-02 -4.140088 7.073737e+00 2.0 11.0 163.0 0.012270 0.066154 TRAV12-1 0.0 -1.0 tcr
0.0 gex_cluster 3.447201e-01 -3.602369 9.329703e+00 2.0 11.0 163.0 0.024540 0.083077 TRBV20-1 0.0 -1.0 tcr

gex_graph_vs_tcr_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top-scoring one) is shown for each clonotype (i.e., neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_gex_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, i.e., where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
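The labeling rules above can be sketched with pandas as follows. This is a hypothetical helper, not CoNGA's plotting code; it assumes a feature is shown only when its score is below the threshold, that cluster-graph rows are marked by clone_index -1 (as in the table above), and that the cluster centroid is the mean of the members' 2D coordinates.

```python
import numpy as np
import pandas as pd

def pick_labels(results, xy, clusters, threshold=1.0, score_col="mwu_pvalue_adj"):
    """Return (text, x, y) labels: best feature per neighborhood, below threshold."""
    hits = results[results[score_col] < threshold].sort_values(score_col)
    labels, seen = [], set()
    for _, row in hits.iterrows():
        if row["clone_index"] >= 0:  # neighborhood centered on one clonotype
            key = ("clone", int(row["clone_index"]))
            pos = xy[int(row["clone_index"])]
            text = row["feature"]
        else:  # cluster-graph hit: place at the cluster centroid, add a trailing '*'
            key = ("cluster", int(row["gex_cluster"]))
            pos = xy[clusters == int(row["gex_cluster"])].mean(axis=0)
            text = row["feature"] + "*"
        if key in seen:
            continue  # keep only the top-scoring feature per neighborhood
        seen.add(key)
        labels.append((text, float(pos[0]), float(pos[1])))
    return labels

results = pd.DataFrame({
    "feature": ["TRAV4", "cd8", "cd8", "cd8"],
    "gex_cluster": [2, 2, 2, 0],
    "clone_index": [-1, -1, 1, 3],
    "mwu_pvalue_adj": [1e-7, 4e-7, 0.13, 1.5],
})
xy = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 3.0]])
clusters = np.array([2, 2, 2, 0])
labels = pick_labels(results, xy, clusters)
```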
Image source: Rotelle_PBMC_Final2_gex_graph_vs_tcr_features_plot.png

gex_graph_vs_tcr_features_panels


Graph-versus-feature analysis was used to identify a set of TCR features that showed biased distributions in GEX neighborhoods. This plot shows the distribution of the top-scoring TCR features on the GEX UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors of each clonotype (K is indicated in the lower right corner of each panel). The minimum and maximum nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score, so the highest-scoring clonotypes are drawn on top.
Image source: Rotelle_PBMC_Final2_gex_graph_vs_tcr_features_panels.png

graph_vs_features_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.
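The row and column orderings described above can be sketched with SciPy's hierarchical clustering. The caption specifies only the correlation metric for rows; the average linkage for rows, the Ward linkage for columns, and the use of a PCA matrix as a stand-in for adata.obsm['X_pca_gex'] are our assumptions, and order_heatmap is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def order_heatmap(features_x_clonotypes, clonotype_pca):
    # rows (features): hierarchical clustering with a correlation distance
    # (average linkage is an assumption; the caption names only the metric)
    row_link = linkage(features_x_clonotypes, method="average", metric="correlation")
    # columns (clonotypes): hierarchical clustering in PCA space
    # (stand-in for adata.obsm['X_pca_gex']; Ward linkage is an assumption)
    col_link = linkage(clonotype_pca, method="ward")
    return leaves_list(row_link), leaves_list(col_link)

rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
# two pairs of near-duplicate features: (0, 1) track a, (2, 3) track b
features = np.vstack([a, a + 0.01 * rng.normal(size=50),
                      b, b + 0.01 * rng.normal(size=50)])
pca = rng.normal(size=(50, 5))
row_order, col_order = order_heatmap(features, pca)
```

Highly correlated features end up adjacent in the row order, which is what makes blocks of co-varying features visible in the heatmap.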

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=81 nearest neighbors (0 means no nbr-averaging).
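K=81 is consistent with the nbr_frac of 0.1 and the 813 clonotypes reported in the Stats section. A minimal sketch of that arithmetic, assuming K is nbr_frac * num_clonotypes rounded down with a floor of one neighbor (the rounding rule is our assumption, and num_neighbors is a hypothetical helper):

```python
import math

def num_neighbors(nbr_frac, num_clonotypes):
    # assumption: round down, but always keep at least one neighbor
    return max(1, math.floor(nbr_frac * num_clonotypes))

print(num_neighbors(0.1, 813))   # K for nbr_frac = 0.1
print(num_neighbors(0.01, 813))  # K for nbr_frac = 0.01
```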

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).
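The color-score pipeline described above (Z-score normalize each feature, average over the K nearest neighbors, then downscale same-modality features) can be sketched as follows. The function name and the guard against zero-variance features are our own; the 0.33 factor comes from the caption, and 1/0.33 rounds to the 3.03 multiplier quoted above.

```python
import numpy as np

RESCALE_FACTOR_FOR_SELF_FEATURES = 0.33  # value reported in the caption

def clustermap_color_scores(raw, knn_indices, is_self_feature):
    """raw: (n_features, n_clonotypes); knn_indices: (n_clonotypes, K);
    is_self_feature: (n_features,) bool, True for the landscape's own modality."""
    raw = np.asarray(raw, dtype=float)
    # Z-score normalize each feature (row) across clonotypes
    mean = raw.mean(axis=1, keepdims=True)
    std = raw.std(axis=1, keepdims=True)
    z = (raw - mean) / np.where(std > 0, std, 1.0)
    # average each clonotype's Z-scores over its K nearest neighbors
    nbr_avg = z[:, knn_indices].mean(axis=2)
    # shrink same-modality features, whose nbr-averaged scores vary more
    scale = np.where(is_self_feature, RESCALE_FACTOR_FOR_SELF_FEATURES, 1.0)
    return nbr_avg * scale[:, None]

raw = np.array([[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]])
knn = np.array([[1, 2], [0, 2], [1, 3], [2, 1]])
out = clustermap_color_scores(raw, knn, np.array([True, False]))
print(round(1 / RESCALE_FACTOR_FOR_SELF_FEATURES, 2))  # multiplier to undo the rescaling
```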


Image source: Rotelle_PBMC_Final2_graph_vs_features_gex_clustermap.png

graph_vs_features_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=81 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: Rotelle_PBMC_Final2_graph_vs_features_tcr_clustermap.png

graph_vs_summary


Summary figure for the graph-vs-graph and graph-vs-features analyses.
Image source: Rotelle_PBMC_Final2_graph_vs_summary.png

gex_clusters_tcrdist_trees


These are TCRdist hierarchical clustering trees for the GEX clusters (cluster assignments stored in adata.obs['clusters_gex']). The trees are colored by CoNGA score, with a color score range of 8.13e+00 (blue) to 8.13e-09 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (roughly speaking; an offset is also applied).
Image source: Rotelle_PBMC_Final2_gex_clusters_tcrdist_trees.png

conga_threshold_tcrdist_tree


This is a TCRdist hierarchical clustering tree for the clonotypes with CoNGA score less than 10.0. The tree is colored by CoNGA score, with a color score range of 1.00e+01 (blue) to 1.00e-08 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (roughly speaking; an offset is also applied).
Image source: Rotelle_PBMC_Final2_conga_threshold_tcrdist_tree.png

hotspot_features


Find GEX (TCR) features that show a biased distribution across the TCR (GEX) neighbor graph, using a simplified version of the Hotspot method from the Yosef lab.

DeTomaso, D., & Yosef, N. (2021). "Hotspot identifies informative gene modules across modalities of single-cell genomics." Cell Systems, 12(5), 446–456.e9.

PMID:33951459

Columns:

Z: HotSpot Z statistic

pvalue_adj: Raw P value times the number of tests (crude Bonferroni correction)

nbr_frac: The neighbor fraction used to build the K-nearest-neighbor graph (nbr_frac = 0.1 means K = 0.1 * num_clonotypes neighbors)

feature: The name of the feature

feature_type: The feature modality (gex or tcr)
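To illustrate what the Z column measures, here is a simplified local-autocorrelation statistic in the spirit of Hotspot. It is not CoNGA's or Hotspot's code: it standardizes each feature empirically (Hotspot uses a model-based null), uses symmetric 0/1 KNN weights, and computes H = sum over i<j of w_ij * z_i * z_j. If the z_i were independent with mean 0 and variance 1, E[H] = 0 and Var[H] = sum over i<j of w_ij^2, giving the Z-score below. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def knn_adjacency(coords, k):
    """Symmetric 0/1 KNN adjacency (brute force, fine for small toy data)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    w = np.zeros(d.shape)
    w[np.repeat(np.arange(len(coords)), k), nbrs.ravel()] = 1.0
    return np.maximum(w, w.T)  # i~j if either is in the other's KNN list

def autocorrelation_z(x, w):
    """Z-score of H = sum_{i<j} w_ij z_i z_j for the standardized feature z."""
    z = (x - x.mean()) / x.std()
    iu = np.triu_indices(len(x), k=1)
    h = np.sum(w[iu] * z[iu[0]] * z[iu[1]])
    return h / np.sqrt(np.sum(w[iu] ** 2))

def bonferroni_pvalue(zscore, n_tests):
    # one-sided upper-tail P value times the number of tests (not capped at 1)
    return norm.sf(zscore) * n_tests

rng = np.random.default_rng(0)
coords = rng.normal(size=(300, 2))       # toy 2D "landscape"
w = knn_adjacency(coords, k=30)          # K = 0.1 * 300
z_smooth = autocorrelation_z(coords[:, 0], w)        # varies smoothly over the graph
z_noise = autocorrelation_z(rng.normal(size=300), w)  # unrelated to the graph
```

For Z near 40 and above, the upper-tail probability is smaller than the smallest representable double, which is consistent with the 0.000000e+00 entries in the table.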


Z pvalue_adj feature feature_type nbr_frac
62.574502 0.000000e+00 ENSMMUG00000060662 gex 0.10
55.160550 0.000000e+00 ENSMMUG00000043894 gex 0.10
50.847794 0.000000e+00 ENSMMUG00000065017 gex 0.10
46.003959 0.000000e+00 ENSMMUG00000056431 gex 0.10
41.382420 0.000000e+00 ENSMMUG00000056515 gex 0.10
39.856341 0.000000e+00 ENSMMUG00000060662 gex 0.01
35.240433 2.900973e-268 ENSMMUG00000056431 gex 0.01
31.210442 4.639250e-210 ENSMMUG00000043894 gex 0.01
29.647833 2.182252e-189 ENSMMUG00000065017 gex 0.01
29.379150 6.119268e-186 ENSMMUG00000061119 gex 0.10
27.784939 3.979716e-166 ENSMMUG00000062211 gex 0.10
27.500087 1.056724e-162 ENSMMUG00000059325 gex 0.01
26.625873 2.055230e-152 ENSMMUG00000061119 gex 0.01
25.960956 8.251805e-145 ENSMMUG00000059325 gex 0.10
24.120108 9.280016e-125 ENSMMUG00000056910 gex 0.01
24.094024 1.742227e-124 ENSMMUG00000052673 gex 0.10
23.892320 2.220853e-122 ENSMMUG00000056910 gex 0.10
23.280648 4.201416e-116 ENSMMUG00000063185 gex 0.10
22.894643 3.169810e-112 ENSMMUG00000054409 gex 0.10
22.495911 2.746059e-108 ENSMMUG00000062085 gex 0.10
21.996424 1.880022e-103 ENSMMUG00000056515 gex 0.01
20.818249 1.784714e-92 ENSMMUG00000061081 gex 0.10
19.476191 1.054729e-80 ENSMMUG00000052673 gex 0.01
19.140492 7.009011e-78 ENSMMUG00000054409 gex 0.01
16.885223 3.487213e-60 ENSMMUG00000062211 gex 0.01
16.758314 2.970861e-59 ENSMMUG00000061081 gex 0.01
14.398764 3.178281e-43 ENSMMUG00000057062 gex 0.10
12.104773 6.012092e-30 ENSMMUG00000063185 gex 0.01
11.356344 6.069473e-28 mait tcr 0.01
10.571631 3.550069e-24 cd8 tcr 0.10
10.864556 1.025987e-23 ENSMMUG00000062085 gex 0.01
10.851013 1.189971e-23 ENSMMUG00000057062 gex 0.01
10.423786 1.699367e-23 tcr_cluster11 tcr 0.10
10.423786 1.699367e-23 TRAV4 tcr 0.10
10.572199 2.419327e-22 CD8A gex 0.10
10.239071 7.994520e-21 ENSMMUG00000003532 gex 0.10
9.717094 1.538259e-18 gex_cluster2 gex 0.10
8.425956 2.158882e-13 ENSMMUG00000062974 gex 0.01
7.995676 7.775088e-12 ENSMMUG00000062974 gex 0.10
7.792790 3.954679e-11 gex_cluster2 gex 0.01
7.768628 4.787011e-11 ENSMMUG00000003532 gex 0.01
7.661030 1.112868e-10 CD8A gex 0.01
7.482907 4.386117e-10 CPA6 gex 0.01
7.466271 4.977590e-10 ENSMMUG00000059367 gex 0.01
6.629539 2.963280e-09 TRAV19 tcr 0.10
6.833487 5.000253e-08 CTSW gex 0.10
6.718754 1.105852e-07 KLRB1 gex 0.01
5.849362 4.342479e-07 tcr_cluster4 tcr 0.10
6.404303 9.114078e-07 RORC gex 0.01
5.500487 3.332938e-06 cd8 tcr 0.01
Omitted 21 lines

hotspot_gex_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the GEX UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered by correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) with a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
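The redundancy filter can be sketched as a greedy pass over the ranked features. This is a hypothetical helper, not conga.plotting.plot_hotspot_umap itself; it assumes the comparison uses the signed Pearson correlation of the features' score vectors, and only the 0.9 cutoff and the parameter name come from the caption.

```python
import numpy as np

def filter_redundant(ranked_features, values, max_feature_correlation=0.9):
    """ranked_features: names, best first; values: dict name -> 1D score array."""
    kept = []
    for name in ranked_features:
        v = values[name]
        # skip a feature that is too correlated with any feature already kept
        if any(np.corrcoef(v, values[other])[0, 1] >= max_feature_correlation
               for other in kept):
            continue
        kept.append(name)
    return kept

a = np.arange(10, dtype=float)
values = {"a": a, "b": 2 * a + 1, "c": a[::-1]}  # b duplicates a; c is anti-correlated
kept = filter_redundant(["a", "b", "c"], values)
```

Under the signed-correlation assumption, an anti-correlated feature such as "c" is kept, because it carries information not visible in "a".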
Image source: Rotelle_PBMC_Final2_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png

hotspot_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=81 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: Rotelle_PBMC_Final2_hotspot_combo_features_0.100_nbrs_gex_plot_clustermap_nbr_avg.png

hotspot_tcr_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the TCR UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered by correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) with a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: Rotelle_PBMC_Final2_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png

hotspot_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=81 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: Rotelle_PBMC_Final2_hotspot_combo_features_0.100_nbrs_tcr_plot_clustermap_nbr_avg.png